Exploration of Assessment Neighborhoods in the District

The FIS team and I have begun to examine the role of transformative investments in the District. Our first pass at this analysis was presented at the 2013 Fall NTA Meeting, and we need to clean a few items up for a proceedings submission. This script is intended to provide some exploratory graphics to capture some economic context over the study period.

In [289]:
import numpy as np
import pandas as pd
from pandas import Series, DataFrame
import pysal as ps
import geopandas as gp
from IPython.display import HTML
from ggplot import *
from bokeh.plotting import *
from pandas.tools.plotting import scatter_matrix
import statsmodels.api as sm

#Set print width
pd.set_option('line_width',100)

The data for this exercise will come from the annual summary stats by neighborhood compiled in the following files in '/home/choct155/ORA/EconDev':

  1. edev_mean.csv
  2. edev_median.csv

First, let's read in the data.

In [290]:
#Establish working directory
workdir='/home/choct155/ORA/EconDev/'

#Read in data
mean_temp=pd.read_csv(workdir+'edev_mean.csv',index_col=['aneigh','year']).sortlevel(0)
med_temp=pd.read_csv(workdir+'edev_median.csv',index_col=['aneigh','year']).sortlevel(0)

#Slice to exclude empty neighborhoods
mean=mean_temp.ix[('16th Street Heights',2001):('Woodridge',2011)]
med=med_temp.ix[('16th Street Heights',2001):('Woodridge',2011)]

#Combine sets
stats=mean.join(med[['50%_iit','50%_prop']])

stats.head()
Out[290]:
mean_iit mean_prop exp num_bbl 50%_iit 50%_prop
aneigh year
16th Street Heights 2001 37423.540696 185900.003636 NaN 40 25567.0 148394
2002 39058.201993 192702.633166 NaN 6 26094.0 150692
2003 39203.947657 194879.192164 NaN 15 26078.5 155063
2004 42839.623425 306471.427547 NaN 33 27727.0 252520
2005 43622.986597 399701.294067 NaN 70 28930.0 336380

The above data must be augmented with spatial references. We can grab said references using the assessment neighborhoood shapefile.

In [291]:
#Read in spatial data
nbhd_temp=gp.GeoDataFrame.from_file(workdir+'as_nbhd.shp').rename(columns={'NEIGHBORHO':'aneigh'})

#Set neighborhoods as row names
nbhd=nbhd_temp.set_index('aneigh')
nbhd.head()
Out[291]:
DESCRIPTIO GIS_ID LABELB NBHD NBHD_NUM NBHD_TEXT NHDLNK OBJECTID SHAPE_AREA SHAPE_LEN geometry
aneigh
Georgetown 025 Georgetown nhood_025 None 025 25 25 025 32 0 0 POLYGON ((394168.0625003650784492 138547.57810...
Glover Park 026 Glover Park nhood_026 None 026 26 26 026 33 0 0 POLYGON ((393275.6563011109828949 139746.56249...
Hawthorne 027 Hawthorne nhood_027 None 027 27 27 027 34 0 0 POLYGON ((395475.5000040307641029 145682.45309...
Hillcrest 028 Hillcrest nhood_028 None 028 28 28 028 35 0 0 POLYGON ((403452.1249997988343239 134667.18750...
Kalorama 029 Kalorama nhood_029 None 029 29 29 029 36 0 0 POLYGON ((395912.3124985843896866 139486.20310...

Now we need to merge this information together. To make this happen, we will need to split the economic data by year, and merge it with the spatial information. All told, we will create 11 GeoDataFrames. Since we are walking before running here, let's test the 2001 case.

In [292]:
#Join 2001 economic data with spatial references
nbhd01=gp.GeoDataFrame(nbhd.join(stats.xs(2001,level='year')))
nbhd01.head()
Out[292]:
DESCRIPTIO GIS_ID LABELB NBHD NBHD_NUM NBHD_TEXT NHDLNK OBJECTID SHAPE_AREA SHAPE_LEN geometry mean_iit mean_prop exp num_bbl 50%_iit 50%_prop
aneigh
16th Street Heights 049 16th Street Heights nhood_049 None 049 49 49 049 19 0 0 POLYGON ((397753.9375005662441254 141723.29690... 37423.540696 185900.003636 NaN 40 25567 148394.0
American University 001 American University nhood_001 None 001 1 1 001 4 0 0 POLYGON ((393362.3124991357326508 141606.35940... 104302.134924 343016.255255 NaN 33 68921 295183.0
Anacostia 002 Anacostia nhood_002 None 002 2 2 002 5 0 0 POLYGON ((402129.8438016176223755 134197.10940... 22235.685076 89607.184932 NaN 82 18286 79293.5
Anacostia Park 064 Anacostia Park nhood_064 None 064 64 64 064 59 0 0 POLYGON ((402293.9999981075525284 134723.28129... NaN NaN NaN NaN NaN NaN
Barry Farms 003 Barry Farms nhood_003 None 003 3 3 003 6 0 0 POLYGON ((399873.1874952167272568 133147.24999... 18984.787792 140198.988095 NaN 2 16373 62936.5

Does it plot?

In [293]:
nbhd01.plot(column='mean_iit', scheme='QUANTILES', k=3, colormap='OrRd')
Out[293]:
<matplotlib.axes.AxesSubplot at 0x253a2410>

Doesn't look quite right. I am wondering if it's an issue with GeoPandas or the data. Do pre-merge values plot appropriately?

In [294]:
nbhd.plot(column='NBHD_NUM', scheme='QUANTILES', k=3, colormap='OrRd')
Out[294]:
<matplotlib.axes.AxesSubplot at 0x250fdc90>

They appear to do so. What do the underlying income data look like?

In [295]:
#Identify and groupby quintile thresholds
quintiles=nbhd01['mean_iit'].groupby(pd.qcut(nbhd01['mean_iit'],5))

#Find the average value by quintile
quint_means=quintiles.mean()

#Extract the upper bounds of the quintile ranges
q_up=[float(x.split(',')[1].replace(']','')) for x in quint_means.index]

#Plot distribution
plt.rcParams['figure.figsize']=18,8
nbhd01['mean_iit'][nbhd01['mean_iit'].notnull()].plot(kind='kde',linewidth=3,secondary_y=True,\
                                                      title='mean_iit Distribution w/ Quintile Upper Threshold Overlay')
nbhd01['mean_iit'][nbhd01['mean_iit'].notnull()].hist(bins=50)
plt.vlines(q_up[:-1],0,0.000006,colors='r')

#Plot average value by quintile
plt.rcParams['figure.figsize']=18,15
fig,axes=plt.subplots(2,sharex=True)
quint_means.plot(kind='bar',title='Average mean_iit Value by Quintile',ax=axes[0])

#Plot group size by quintile (validation)
quintiles.count().plot(kind='bar',title='mean_iit Count by Quintile',ax=axes[1])
Out[295]:
<matplotlib.axes.AxesSubplot at 0x4440db50>

The plot above captures the kernel density estimate of the mean_iit distribution. The upper thresholds for the first four quintile ranges have also been marked by the vertical blue lines. Nothing looks all that strange. The spacing on the quintile looks a little funny because of the relatively low number of observations (maximum of 70 per year).

Perhaps it has to do with the number of discrete categories. Let's change that up.

In [296]:
plt.rcParams['figure.figsize']=18,20
fig,axes=plt.subplots(2,2,sharex=True,sharey=True)
nbhd01.plot(column='mean_iit', scheme='QUANTILES', k=3, colormap='OrRd',axes=axes[0,0])
axes[0,0].set_title('k=3')
nbhd01.plot(column='mean_iit', scheme='QUANTILES', k=4, colormap='OrRd',axes=axes[0,1])
axes[0,1].set_title('k=4')
nbhd01.plot(column='mean_iit', scheme='QUANTILES', k=5, colormap='OrRd',axes=axes[1,0])
axes[1,0].set_title('k=5')
nbhd01.plot(column='mean_iit', scheme='QUANTILES', k=6, colormap='OrRd',axes=axes[1,1])
axes[1,1].set_title('k=6')
plt.suptitle('Distribution of mean_iit by Quantile Bin Count',fontsize=20)
Out[296]:
<matplotlib.text.Text at 0x3efea450>

I believe we have identified the source of the problem. We should be specifying quintiles with k=5. It would also be useful to see what the different classification schemes offer. In addition to 'QUANTILES', the following classification rules can be employed:

In [297]:
#Set plot size
plt.rcParams['figure.figsize']=18,9

#Create plotting object
fig,axes=plt.subplots(1,2,sharex=True,sharey=True)

#Plot Fisher-Jenks and change bg
nbhd01.plot(column='mean_iit', scheme='fisher_jenks', k=8, colormap='OrRd',axes=axes[0])
axes[0].set_title('Fisher-Jenks')
axes[0].patch.set_facecolor('white')

#Plot Equal Interval and change bg
nbhd01.plot(column='mean_iit', scheme='equal_interval', k=8, colormap='OrRd',axes=axes[1])
axes[1].set_title('Equal Interval')
axes[1].patch.set_facecolor('white')

It looks like extreme values on the high-end are probably disrupting things. For Fisher-Jenks, the scheme seeks to minimize variance within groups while maximizing variance across groups. I suspect to many observations on the low end fall into a single category in an effort to accommodate the very high variance among the highest values. For Equal Interval, the gap between the highest values and more typical values is probably consuming multiple groups. The Quantile approach looks like a winner.

Spatio-Temporal Plotting

Now we have an approach that can be used to examine these distribution plots over time. The first order of business is generating GeoDataFrames for each year.

In [298]:
#Create container for GDFs
nbhd_gdf_list=[]

#Create container for years
yr_list=[]

#For each year in 2001-2011...
for yr in range(2001,2012):
    #...grab the year...
    yr_list.append(yr)
    #...and the newly merged GDF
    nbhd_gdf_list.append(gp.GeoDataFrame(nbhd.join(stats.xs(yr,level='year'))))
    
#Capture all GDFs in a dictionary
nbhd_dict=dict(zip(yr_list,nbhd_gdf_list))

nbhd_dict[2001].head()
Out[298]:
DESCRIPTIO GIS_ID LABELB NBHD NBHD_NUM NBHD_TEXT NHDLNK OBJECTID SHAPE_AREA SHAPE_LEN geometry mean_iit mean_prop exp num_bbl 50%_iit 50%_prop
aneigh
16th Street Heights 049 16th Street Heights nhood_049 None 049 49 49 049 19 0 0 POLYGON ((397753.9375005662441254 141723.29690... 37423.540696 185900.003636 NaN 40 25567 148394.0
American University 001 American University nhood_001 None 001 1 1 001 4 0 0 POLYGON ((393362.3124991357326508 141606.35940... 104302.134924 343016.255255 NaN 33 68921 295183.0
Anacostia 002 Anacostia nhood_002 None 002 2 2 002 5 0 0 POLYGON ((402129.8438016176223755 134197.10940... 22235.685076 89607.184932 NaN 82 18286 79293.5
Anacostia Park 064 Anacostia Park nhood_064 None 064 64 64 064 59 0 0 POLYGON ((402293.9999981075525284 134723.28129... NaN NaN NaN NaN NaN NaN
Barry Farms 003 Barry Farms nhood_003 None 003 3 3 003 6 0 0 POLYGON ((399873.1874952167272568 133147.24999... 18984.787792 140198.988095 NaN 2 16373 62936.5

Now we can plot the quintile distribution over time. It would be useful to directly compare the income (adjusted gross income) and property (assessed value) data. It's difficult to see annual trends, so we will subset to three years: [2001,2006,2011]. It would also be useful to capture the difference over the 2001-2011 period in one plot.

In [299]:
#Calculate difference across the time period
dstats=stats.xs(2011,level='year')-stats.xs(2001,level='year')

#Create new GDF
nbhd_diff=gp.GeoDataFrame(nbhd.join(dstats))

nbhd_diff
Out[299]:
<class 'geopandas.geodataframe.GeoDataFrame'>
Index: 71 entries, 16th Street Heights to Woodridge
Data columns (total 17 columns):
DESCRIPTIO    71  non-null values
GIS_ID        71  non-null values
LABELB        2  non-null values
NBHD          71  non-null values
NBHD_NUM      71  non-null values
NBHD_TEXT     71  non-null values
NHDLNK        71  non-null values
OBJECTID      71  non-null values
SHAPE_AREA    71  non-null values
SHAPE_LEN     71  non-null values
geometry      71  non-null values
mean_iit      64  non-null values
mean_prop     58  non-null values
exp           10  non-null values
num_bbl       57  non-null values
50%_iit       64  non-null values
50%_prop      58  non-null values
dtypes: float64(8), int64(2), object(7)

With the difference calculated, we can move ahead to plotting.

In [300]:
#Set plot size
plt.rcParams['figure.figsize']=18,32

#Update nbhd_dict to include the difference information
nbhd_dict['diff']=nbhd_diff

#Create list of year subset
yr_sub=[2001,2006,2011,'diff']

#Create plotting object
fig,axes=plt.subplots(len(yr_sub),2,sharex=True,sharey=True)

#For each year in 2001-2011...
for i in range(len(yr_sub)):
    #...plot the distribution of mean_iit...
    nbhd_dict[yr_sub[i]].plot(column='mean_iit', scheme='QUANTILES', k=5, colormap='Blues',axes=axes[i,0])
    axes[i,0].set_title('mean_iit in '+str(yr_sub[i]))
    axes[i,0].patch.set_facecolor('white')
    #...and the distribution of mean_prop
    nbhd_dict[yr_sub[i]].plot(column='mean_prop', scheme='QUANTILES', k=5, colormap='Blues',axes=axes[i,1])
    axes[i,1].set_title('mean_prop in '+str(yr_sub[i]))
    axes[i,1].patch.set_facecolor('white')

#Remove unused space
plt.tight_layout()

#Adjust titles on last two plots
axes[3,0].set_title('mean_iit Difference (2001-2011)')
axes[3,1].set_title('mean_prop Difference (2001-2011)')
Out[300]:
<matplotlib.text.Text at 0x4567de50>

The general eastward trend is expected. It is interesting to note, however, that most of the income growth has been captured in the middle of the city, while property growth has been greater in the more residential areas of Ward 5 and surrounding areas.

What about the median values?

In [301]:
#Create plotting object
fig,axes=plt.subplots(len(yr_sub),2,sharex=True,sharey=True)

#For each year in 2001-2011...
for i in range(len(yr_sub)):
    #...plot the distribution of mean_iit...
    nbhd_dict[yr_sub[i]].plot(column='50%_iit', scheme='QUANTILES', k=5, colormap='Blues',axes=axes[i,0])
    axes[i,0].set_title('50%_iit in '+str(yr_sub[i]))
    axes[i,0].patch.set_facecolor('white')
    #...and the distribution of mean_prop
    nbhd_dict[yr_sub[i]].plot(column='50%_prop', scheme='QUANTILES', k=5, colormap='Blues',axes=axes[i,1])
    axes[i,1].set_title('50%_prop in '+str(yr_sub[i]))
    axes[i,1].patch.set_facecolor('white')

#Remove unused space
plt.tight_layout()

#Adjust titles on last two plots
axes[3,0].set_title('50%_iit Difference (2001-2011)')
axes[3,1].set_title('50%_prop Difference (2001-2011)')
Out[301]:
<matplotlib.text.Text at 0x3e2e5710>

While average assessment values have grown in the 'Near Northeast', the median values do not display the same action. Median values did have noticeable growth along the NW/NE border, which reinforced the eastward shift. On the other hand, median incomes values cluster in the center of the city just like mean values.

What about business license activity? The autoregressive nature of this figure is reduced because it captures license growth (unlike the stock characteristics captured by adjusted gross income and assessment value). Consequently, we will look at each year over the 2001-2011 period.

In [302]:
#Set plot size
plt.rcParams['figure.figsize']=18,25

#Create plotting object
fig,axes=plt.subplots(4,3)

#For each year in 2001-2011...
for i in range(len(yr_list)):
    #...plot the distribution of mean_iit...
    nbhd_dict[yr_list[i]].plot(column='num_bbl', scheme='QUANTILES', k=5, colormap='Blues',axes=axes[i//3,i%3])
    #...set face color to white and...
    axes[i//3,i%3].patch.set_facecolor('white')
    #...set titles
    axes[i//3,i%3].set_title('num_bbl in '+str(yr_list[i]))
    #... and turn off those pesky axes
    axes[i//3,i%3].set_axis_off()

plt.tight_layout()
plt.axis('off')
Out[302]:
(0.0, 1.0, 0.0, 1.0)

Interesting, the patterns for business development do not appear to shift much over time at first blush.

Which areas have received the greatest amount of investment?

In [303]:
#Sum exp by neighborhood
exp_tot=stats['exp'].groupby(level='aneigh').sum()

#Join with GDF
nbhd['exp_tot']=exp_tot.ix[nbhd.index]

#Plot investment over the 2001-2011 time period
plt.rcParams['figure.figsize']=15,15
fig=plt.figure()
ax = fig.add_subplot(111)
ax.patch.set_facecolor('white')
nbhd.fillna(0).plot(column='exp_tot', scheme='Quantiles', k=9, colormap='Blues',axes=ax)
ax.set_title('Distribution of Total Expenditure by Neighborhood (2001-2011)')
Out[303]:
<matplotlib.text.Text at 0x56f58110>

Exploring the Growth Environment

Now that we have an idea of the dynamics of value levels, it would be interesting to explore the growth trajectory of these neighborhoods. Specifically, we want to know the following:

  1. What is the relative rate of growth across neighborhoods?
  2. Do we notice differences in growth rates by 'era'?

The latter seeks to understand shifts in growth profiles. For our purposes, 'era' refers to one of three collections of years:

  1. 2002-2004: captures the .com/9-11 effect
  2. 2005-2007: captures the back end of the .com recession/9-11
  3. 2008-2011: captures the Great Recession and recovery period

Obviously these 'eras' are not easily separated. The idea is just to provide some allowance for differing growth regimes. To construct them, we will fold the stats DF into the three periods (aggregating via mean), and merge them in with the nbhd GeoDataFrame.

In [304]:
#Create copy of stats, but capturing percentage change
stats_era=stats.pct_change().reset_index().copy()

#Create mapping dictionary for eras
era_map_dict=dict({2001:0,
               2002:0,
               2003:0,
               2004:0,
               2005:1,
               2006:1,
               2007:1,
               2008:2,
               2009:2,
               2010:2,
               2011:2})

#Recode years to eras
stats_era['era']=stats_era['year'].map(era_map_dict)

#Groupby neighborhood and era, aggregating by mean
s_era=stats_era.groupby([stats_era['aneigh'],stats_era['era']]).mean()

#Create container for new GDFs
era_gdf_list=[]

#Create container for era labels
era_list=[]

#For each era...
for era in s_era.index.get_level_values(level='era'):
    #...capture the era...
    era_list.append(era)
    #...and capture a new era-specific GDF...
    era_gdf_list.append(gp.GeoDataFrame(nbhd.join(s_era.xs(era,level='era'))))

#Capture new era-specific GDFs in a dictionary
era_dict=dict(zip(era_list,era_gdf_list))

era_dict[1].head()
Out[304]:
DESCRIPTIO GIS_ID LABELB NBHD NBHD_NUM NBHD_TEXT NHDLNK OBJECTID SHAPE_AREA SHAPE_LEN geometry exp_tot year mean_iit mean_prop exp num_bbl 50%_iit 50%_prop era
aneigh
16th Street Heights 049 16th Street Heights nhood_049 None 049 49 49 049 19 0 0 POLYGON ((397753.9375005662441254 141723.29690... NaN 2006 0.034543 0.205481 NaN 0.343482 0.020916 0.238907 1
American University 001 American University nhood_001 None 001 1 1 001 4 0 0 POLYGON ((393362.3124991357326508 141606.35940... NaN 2006 0.045758 0.138915 NaN -0.022304 0.025021 0.135439 1
Anacostia 002 Anacostia nhood_002 None 002 2 2 002 5 0 0 POLYGON ((402129.8438016176223755 134197.10940... NaN 2006 0.021353 0.244339 NaN -0.135516 0.019955 0.207580 1
Anacostia Park 064 Anacostia Park nhood_064 None 064 64 64 064 59 0 0 POLYGON ((402293.9999981075525284 134723.28129... NaN 2006 NaN -0.078672 NaN NaN NaN -0.076647 1
Barry Farms 003 Barry Farms nhood_003 None 003 3 3 003 6 0 0 POLYGON ((399873.1874952167272568 133147.24999... NaN 2006 0.041114 0.184210 NaN 3.942210 0.019017 0.164324 1

It would also be usefult to know the economic potential of these neighborhoods to some extent. We can do this by assigning each neighborhood a quintile ranking pegged to the 2001 mean_iit value via a dictionary map from the neighborhood values in the index. (We could, of course, choose a different value for this filter if desired.) Note that we will want to color by quintile to add an informational dimension to the plots. We will, therefore, include the set of colors used directly into our newly augmented GDFs.

In [305]:
#Subset to 2001
stat01=stats.xs(2001,level='year')

#Capture quintile groupings
stat01['q_econ']=pd.qcut(stats.xs(2001,level='year')['mean_iit'],5)

#Generate dictionary mapping quintile groupings to clearer labels
q_dict=dict({'[4376.277, 24504.371]':'q1',
             '(24504.371, 35631.989]':'q2',
             '(35631.989, 61488.399]':'q3',
             '(61488.399, 109608.369]':'q4',
             '(109608.369, 916785.75]':'q5'})

#Generate a dictionary mapping colors to quintiles
qcolor_dict=dict({'q1':'#EDF8E9',
                  'q2':'#BAE4B3',
                  'q3':'#74C476',
                  'q4':'#31A354',
                  'q5':'#006D2C'})
             
#Assign new quintile labels
stat01['q_start']=stat01['q_econ'].map(q_dict)

#Assign colors
stat01['colors']=stat01['q_start'].map(qcolor_dict)

#Define new function to extreme code indicators of interest
def extreme(x):
    '''Caps percentage change on [-2,2] interval'''
    #Create container for new values
    x_new_list=[]
    
    #For each value...
    for i in range(len(x)):
        #...if less than -2 cap at -2...
        if x[i]<-2:
            x_new_list.append(-2.0)
        #...if between -2 and 2 leave as is...
        elif (x[i]>-2) & (x[i]<=2):
            x_new_list.append(x[i])
        #...otherwise cap at 2
        elif x[i]>2:
            x_new_list.append(2)
        else:
            x_new_list.append(0)
    return x_new_list

#Create new container for augmented era GDFs
era_list2=[]

#Define desired columns
era_cols=['q_start','colors']

#For df in era_dict...
for df in era_dict.values():
    #...add in q_start...
    df_new=pd.merge(df,stat01[era_cols],how='left',left_index=True,right_index=True)
    #...add in extreme coded values...
    df_new['mean_iit_ec']=extreme(df_new['mean_iit'])
    df_new['50%_iit_ec']=extreme(df_new['50%_iit'])
    df_new['mean_prop_ec']=extreme(df_new['mean_prop'])
    df_new['50%_prop_ec']=extreme(df_new['50%_prop'])
    
    #...and throw it in the list
    era_list2.append(df_new)

#Create new dict
era_dict2=dict(zip([1,2,3],era_list2))
    
era_dict2[1].head().T
Out[305]:
aneigh 16th Street Heights American University Anacostia Anacostia Park Barry Farms
DESCRIPTIO 049 16th Street Heights 001 American University 002 Anacostia 064 Anacostia Park 003 Barry Farms
GIS_ID nhood_049 nhood_001 nhood_002 nhood_064 nhood_003
LABELB None None None None None
NBHD 049 001 002 064 003
NBHD_NUM 49 1 2 64 3
NBHD_TEXT 49 1 2 64 3
NHDLNK 049 001 002 064 003
OBJECTID 19 4 5 59 6
SHAPE_AREA 0 0 0 0 0
SHAPE_LEN 0 0 0 0 0
geometry POLYGON ((397753.9375005662441254 141723.29690... POLYGON ((393362.3124991357326508 141606.35940... POLYGON ((402129.8438016176223755 134197.10940... POLYGON ((402293.9999981075525284 134723.28129... POLYGON ((399873.1874952167272568 133147.24999...
exp_tot NaN NaN NaN NaN NaN
year 2002.5 2002.5 2002.5 2004 2002.5
mean_iit 0.04671634 0.3501658 -0.1844525 NaN -0.05471703
mean_prop 0.2068368 0.1055571 -0.1410949 24.87695 -0.08580156
exp NaN NaN NaN NaN NaN
num_bbl 0.6166667 1.476732 0.1241586 NaN 1.715476
50%_iit 0.02774383 0.3575031 -0.1696829 NaN -0.03929195
50%_prop 0.2243305 0.07886029 -0.1506306 36.07052 -0.119842
era 0 0 0 0 0
q_start q3 q4 q1 NaN q1
colors #74C476 #31A354 #EDF8E9 NaN #EDF8E9
mean_iit_ec 0.04671634 0.3501658 -0.1844525 0 -0.05471703
50%_iit_ec 0.02774383 0.3575031 -0.1696829 0 -0.03929195
mean_prop_ec 0.2068368 0.1055571 -0.1410949 2 -0.08580156
50%_prop_ec 0.2243305 0.07886029 -0.1506306 2 -0.119842

First of all, what were the growth rankings in each of the eras? We can look at each era in turn, comparing adjusted gross income with assessment value.

In [306]:
def growth_compare(i,mean_med):
    #Generate plot space
    fig,axes=plt.subplots(1,2,sharey=True)
    #Create new GDF that excludes empty colors
    df_temp=era_dict2[i][era_dict2[i]['colors'].notnull()].sort(columns='mean_iit')
    #Identify color values
    colors=list(df_temp['colors'].values)
    #Plot AGI and assessment value
    df_temp[mean_med+'_iit_ec'].plot(kind='barh',ax=axes[0],color=colors,title=mean_med+' AGI Growth')
    df_temp[mean_med+'_prop_ec'].plot(kind='barh',ax=axes[1],color=colors,title=mean_med+' Assessment Value Growth')

growth_compare(1,'mean')
plt.suptitle('Average Annual Growth During Era #1 (2001-2004)',fontsize=20)

growth_compare(1,'50%')
plt.suptitle('Average Annual Growth During Era #1 (2001-2004)',fontsize=20)
Out[306]:
<matplotlib.text.Text at 0x595fd390>

Interesting. Early on there appeared to be little correlation between the growth rates in AGI and the growth rates in assessment value. It does appear, however, that there is some correlation between starting mean AGI and growth rate. Darker colors indicate higher quintiles.

Similar stories seem to arise in the median data as well. There are some curious deviations however. Neighborhoods like Cleveland Park, Georgetown, and Foggy Bottom experienced marginal growth in median AGI despite material increases in mean AGI. In contrast, Washington Navy Yard and DC Village saw dramatic increases in median income without corresponding increases mean income. Clearly the shape of the income profile has changed dramatically in some neighborhoods. Assessment values, however, appear to be characterized by more even growth.

How does the picture look in the second era?

In [307]:
growth_compare(2,'mean')
plt.suptitle('Average Annual Growth During Era #2 (2005-2007)',fontsize=20)

growth_compare(2,'50%')
plt.suptitle('Average Annual Growth During Era #2 (2005-2007)',fontsize=20)
Out[307]:
<matplotlib.text.Text at 0x59cefb90>

The second era is characterized by a substantial decline in variation, for both AGI and assessed value. Furthermore, of those experiencing positive growth in AGI, the magnitude of said growth decline. Assessed values, on the other hand, experienced virtually positive growth for all neighborhoods. The correlation between AGI growth and assessed value growth still appears to be weak at best.

Comparing the mean and median plots reveals distributions that, by and large, became more skewed towards higher incomes and assessment values during this period (as evidenced by mean growth outpacing median growth in most neighborhoods). This is perhaps unsurprising, fitting a national narrative about capital wealth appreciation in the lead up to the Recession.

What about the Great Recession and recovery era?

In [308]:
growth_compare(3,'mean')
plt.suptitle('Average Annual Growth During Era #3 (2008-2011)',fontsize=20)

growth_compare(3,'50%')
plt.suptitle('Average Annual Growth During Era #3 (2008-2011)',fontsize=20)
Out[308]:
<matplotlib.text.Text at 0x17ea1a90>

Growth all but vanishes in this time period, and the correlation between higher starting AGI and higher growth seems to have dissipated as well. Furthermore, aside from a sharp downward skew in Massachusetts Avenue Heights, the story is consistent across mean and median data.

Mapping these data would aid in identifying clustered growth activity. We can map both AGI and assessed value by era.

In [309]:
#Set plot dimensions
plt.rcParams['figure.figsize']=18,24

#Generate plotting object
fig,axes=plt.subplots(len(era_dict2),2)

#For each era...
for i in era_dict2:
    #...plot the distribution of mean_iit growth...
    gp.GeoDataFrame(era_dict2[i]).plot(column='mean_iit', scheme='QUANTILES', k=5, colormap='Reds',axes=axes[i-1,0])
    axes[i-1,0].set_title('Average Growth in mean_iit During Era #'+str(i))
    axes[i-1,0].patch.set_facecolor('white')
    axes[i-1,0].set_axis_off()
    #...and the distribution of mean_prop growth
    gp.GeoDataFrame(era_dict2[i]).plot(column='mean_prop', scheme='QUANTILES', k=5, colormap='Reds',axes=axes[i-1,1])
    axes[i-1,1].set_title('Average Growth in mean_prop During Era #'+str(i))
    axes[i-1,1].patch.set_facecolor('white')
    axes[i-1,1].set_axis_off()

plt.tight_layout()

Note that the colors here do not correspond to the same groups defined above (by mean AGI quintiles in 2001). Rather, they are quintile ranges measured with the data from each era's GDF. This is the primary reason for the shift in color.

Relative income growth continues to shift from dense concentrations in the affluent northwest to pockets located in the center of the District. The eastward shift in property growth is far more dramatic. The highest growth values occur largely in the residential districts to the east of Brookland in later years.

One might wonder whether or not the median data corroborate this story.

In [310]:
#Generate plotting object
fig,axes=plt.subplots(len(era_dict2),2)

#For each era...
for i in era_dict2:
    #...plot the distribution of mean_iit growth...
    gp.GeoDataFrame(era_dict2[i]).plot(column='50%_iit', scheme='QUANTILES', k=5, colormap='Reds',axes=axes[i-1,0])
    axes[i-1,0].set_title('Average Growth in 50%_iit During Era #'+str(i))
    axes[i-1,0].patch.set_facecolor('white')
    axes[i-1,0].set_axis_off()
    #...and the distribution of mean_prop growth
    gp.GeoDataFrame(era_dict2[i]).plot(column='50%_prop', scheme='QUANTILES', k=5, colormap='Reds',axes=axes[i-1,1])
    axes[i-1,1].set_title('Average Growth in 50%_prop During Era #'+str(i))
    axes[i-1,1].patch.set_facecolor('white')
    axes[i-1,1].set_axis_off()

plt.tight_layout()

Indeed they do, but the eastward trend appears more pronounced on the income side, while the median assessment value growth activity has a distinctly southern flavor.

Are Investment Neighborhoods Different?

In many ways, the above analysis provides empirical support to a narrative familiar to ORA staff. The income distribution in DC is shifting spatially. New real estate opportunities are being exploited by young professionals in historically less developed neighborhoods, which disproportionately reside on the eastern side of the District. We are seeing income and property value movement that is largely consistent with a shift in the locus of economic activity.

While this is generally interesting, the motivation for this exploration is the analysis of the impact of transformative investment. It would thus be useful to explore the correlation between public investment and growth in our indicators. To be specific, we need to be able to directly compare levels of public investment and growth in AGI and assessment value. We already have the growth values, but we lost our levels during the procedure used to calculate the growth values. Consequently, we need to calculate average investment for each neighborhood in each era. To preserve the ability to compare past investment to future growth, we will capture all of the era investment averages in one DF and then merge them with the growth indicators from each of the three eras.

In [311]:
#Create copy of stats, but capturing levels this time
inv_era=stats.reset_index().copy()

#Recode years to eras
inv_era['era']=inv_era['year'].map(era_map_dict)

#Group investment by neighborhood and era, aggregating by mean
i_era=inv_era['exp'][inv_era['era'].notnull()].groupby([inv_era['aneigh'],inv_era['era']]).mean().fillna(0)

#Group business licenses by neighborhood and era, aggregating by sum
bbl_era=inv_era['num_bbl'][inv_era['era'].notnull()].groupby([inv_era['aneigh'],inv_era['era']]).sum().fillna(0)

#Combine investment and BBL data
inv_bbl_era=DataFrame(i_era).rename(columns={0:'inv_era'}).join(DataFrame(bbl_era).rename(columns={0:'bbl_era'}))

#Unstack to hold all eras with a single neighborhood reference
ib_era=inv_bbl_era.unstack(level='era')

#Rename columns
ib_era.columns=['inv1','inv2','inv3','bbl1','bbl2','bbl3']

ib_era.head(20)
Out[311]:
inv1 inv2 inv3 bbl1 bbl2 bbl3
aneigh
16th Street Heights 0.00 0.000000 0.0 94 218 439
American University 0.00 0.000000 0.0 88 140 352
Anacostia 0.00 0.000000 0.0 280 264 815
Anacostia Park 0.00 0.000000 0.0 0 0 2
Barry Farms 0.00 0.000000 0.0 33 259 239
Berkley 0.00 0.000000 0.0 4 21 39
Bolling Air Force Base 0.00 0.000000 0.0 7 15 5
Brentwood 0.00 0.000000 1200000.0 171 363 649
Brightwood 0.00 0.000000 0.0 148 321 654
Brookland 0.00 0.000000 0.0 283 565 1210
Burleith 0.00 0.000000 0.0 10 117 112
Capitol Hill 0.00 0.000000 0.0 99 405 771
Central-tri 1 0.00 0.000000 0.0 612 846 2747
Central-tri 3 18496552.75 1044743.666667 5246265.5 334 970 2390
Chevy Chase 0.00 0.000000 0.0 89 168 466
Chillum 0.00 0.000000 0.0 69 101 222
Cleveland Park 0.00 0.000000 0.0 21 177 557
Colonial Village 0.00 0.000000 0.0 2 5 36
Columbia Heights 513868.50 11223061.000000 11797136.5 372 874 1854
Congress Heights 0.00 0.000000 0.0 178 568 1182

With the investment and BBL information compiled by neighborhood and era, we can merge it back in with the existing data.

In [312]:
#Create container for new GDFs
era_gdf_list2=[]

#Create container for era labels
era_list2=[]

#For each era...
for era in era_dict2.keys():
    #...capture the era...
    era_list2.append(era)
    #...and capture a new era-specific GDF...
    era_gdf_list2.append(gp.GeoDataFrame(era_dict2[era].join(ib_era)))

#Capture new era-specific GDFs in a dictionary
era_dict3=dict(zip(era_list2,era_gdf_list2))

era_dict3[3].head().T
Out[312]:
aneigh 16th Street Heights American University Anacostia Anacostia Park Barry Farms
DESCRIPTIO 049 16th Street Heights 001 American University 002 Anacostia 064 Anacostia Park 003 Barry Farms
GIS_ID nhood_049 nhood_001 nhood_002 nhood_064 nhood_003
LABELB None None None None None
NBHD 049 001 002 064 003
NBHD_NUM 49 1 2 64 3
NBHD_TEXT 49 1 2 64 3
NHDLNK 049 001 002 064 003
OBJECTID 19 4 5 59 6
SHAPE_AREA 0 0 0 0 0
SHAPE_LEN 0 0 0 0 0
geometry POLYGON ((397753.9375005662441254 141723.29690... POLYGON ((393362.3124991357326508 141606.35940... POLYGON ((402129.8438016176223755 134197.10940... POLYGON ((402293.9999981075525284 134723.28129... POLYGON ((399873.1874952167272568 133147.24999...
exp_tot NaN NaN NaN NaN NaN
year 2009.5 2009.5 2009.5 2009 2009.5
mean_iit 0.001796586 0.02342792 0.02161648 NaN 0.04728481
mean_prop 0.02011667 0.02658858 0.08858267 0.06899296 0.05361671
exp NaN NaN NaN NaN NaN
num_bbl 0.3977771 0.2475023 0.9001058 -0.9565217 -0.105701
50%_iit 0.003418689 0.02206532 0.01732333 NaN 0.0269894
50%_prop 0.003246311 0.007945699 0.0633747 0.008518167 0.07127914
era 2 2 2 2 2
q_start q3 q4 q1 NaN q1
colors #74C476 #31A354 #EDF8E9 NaN #EDF8E9
mean_iit_ec 0.001796586 0.02342792 0.02161648 0 0.04728481
50%_iit_ec 0.003418689 0.02206532 0.01732333 0 0.0269894
mean_prop_ec 0.02011667 0.02658858 0.08858267 0.06899296 0.05361671
50%_prop_ec 0.003246311 0.007945699 0.0633747 0.008518167 0.07127914
inv1 0 0 0 0 0
inv2 0 0 0 0 0
inv3 0 0 0 0 0
bbl1 94 88 280 0 33
bbl2 218 140 264 0 259
bbl3 439 352 815 2 239

We can now compare investment in the three eras. Colorized correlation plots may be helpful.

Note that while these plots will be for primary use, there are several more detailed scatter plots and correlation matrices in the Appendix. For each era, we will evaluate the correlation between our responses (AGI and assessment value growth) and investment that occurs in the current and prior eras. Thus, for the first era responses, scatters will evaluate only first era investment. For the second era responses, scatters evaluate both the second and first era investments. In similar fashion, the third era responses are plotted against all three.

Getting back to the plots that immediately follow, unfortunately, Python sometimes makes very simple things difficult. In this case, while plotting the correlation matrix as a psuedocolors is relatively straightforward, centering diverging colors on zero is not. Why do we want to center on zero? It aids in interpretation if one color is always positive and one is always negative.

To make this happen in Python, we always have the option of explicitly defining color ramps for our particular data set. A more robust option is to define a new subclass that inherits from matplotlib.colors.Normalize. I guess I had to dive into custom classes at some point...

In [313]:
from matplotlib.colors import Normalize

class MidpointNormalize(Normalize):
    def __init__(self, vmin=None, vmax=None, midpoint=None, clip=False):
        self.midpoint = midpoint
        Normalize.__init__(self, vmin, vmax, clip)

    def __call__(self, value, clip=None):
        # I'm ignoring masked values and all kinds of edge cases to make a
        # simple example...
        x, y = [self.vmin, self.midpoint, self.vmax], [0, 0.5, 1]
        return np.ma.masked_array(np.interp(value, x, y))

This subclass option is the only method that allows for continuous color mapping, something we are more often interested in than not.

In [314]:
#Define desired columns
cor_cols1=['mean_iit','mean_prop','50%_iit','50%_prop','inv1']

#Define first era subset
e1_sub=era_dict3[1][cor_cols1].dropna()

#Generate correlation matrix (transposing captures order provided in the DF)
e1_corr=np.corrcoef(e1_sub.T)

#Set plot size
plt.rcParams['figure.figsize']=13,10

#Construct subplot
fig,ax=plt.subplots()

#Define midpoint object
norm=MidpointNormalize(midpoint=0)

#Generate plot object
e1c=ax.imshow(e1_corr,norm=norm,cmap='RdBu',interpolation='none')

#Add color bar
fig.colorbar(e1c)

#Set tick positions
labelPos=np.arange(0,len(e1_sub.columns))

#Set tick labels
pylab.xticks(labelPos,e1_sub.columns)
pylab.yticks(labelPos,e1_sub.columns)
pylab.title('Correlation Plot in Era #1 (2001-2004)');
In [315]:
#Define desired columns
cor_cols2=['mean_iit','mean_prop','50%_iit','50%_prop','inv1','inv2']

#Define second era subset
e2_sub=era_dict3[2][cor_cols2].dropna().replace(inf,0)

#Generate correlation matrix (transposing captures order provided in the DF)
e2_corr=np.corrcoef(e2_sub.T)

#Set plot size
plt.rcParams['figure.figsize']=13,10

#Construct subplot
fig,ax=plt.subplots()

#Generate plot object
e2c=ax.imshow(e2_corr,norm=norm,cmap='RdBu',interpolation='none')

#Add color bar
fig.colorbar(e2c)

#Set tick positions
labelPos2=np.arange(0,len(e2_sub.columns))

#Set tick labels
pylab.xticks(labelPos2,e2_sub.columns)
pylab.yticks(labelPos2,e2_sub.columns)
pylab.title('Correlation Plot in Era #2 (2005-2007)')
Out[315]:
<matplotlib.text.Text at 0x3de7d710>
In [316]:
#Define desired columns
cor_cols3=['mean_iit','mean_prop','50%_iit','50%_prop','inv1','inv2','inv3']

#Define third era subset
e3_sub=era_dict3[3][cor_cols].dropna()

#Generate correlation matrix (transposing captures order provided in the DF)
e3_corr=np.corrcoef(e3_sub.T)

#Set plot size
plt.rcParams['figure.figsize']=13,10

#Construct subplot
fig,ax=plt.subplots()

#Generate plot object
e3c=ax.imshow(e3_corr,norm=norm,cmap='RdBu',interpolation='none')

#Add color bar
fig.colorbar(e3c)

#Set tick positions
labelPos3=np.arange(0,len(e3_sub.columns))

#Set tick labels
pylab.xticks(labelPos3,e3_sub.columns)
pylab.yticks(labelPos3,e3_sub.columns)
pylab.title('Correlation Plot in Era #3 (2008-2011)')
Out[316]:
<matplotlib.text.Text at 0x592c06d0>

Comparison of the correlation plots over the three periods reveals remarkably different behavior across time periods. It suggests that the association between the variables presented here is conditional on outside factors. This is most noticeable in the third era, with the general level of positive correlation increasing markedly. There are also a few particular items of note:

  1. In the first era, mean and median assessment value growth is highly correlated. While mean and median income is also positively correlated, it is to a much lesser extent.
  2. In the second era, mean and median income are correlated in a way similar to the first era. In contrast, mean and median assessment value are now moving in opposite directions.
  3. In the first two eras, investment levels are weakly or outright negatively correlated with growth in the income and property tax bases. This picture changes in the third era to a weak or somewhat postivie correlation.
  4. The most significant investment correlation appears in the third era. Average assessment value growth is moderately correlated with investment in the previous time period.

At this point, we can look at average growth rates in investment neighborhoods relative to the other neighborhoods in each era. For our purposes, we will view the existence of investment as a binary treatment. Any non-zero investment level neighborhoods go into the treatment group, while all remaining neighborhoods constitute the control. The first step is parsing out these groups for each era.

In [317]:
#Establish treatment groups by era
e1_treat=era_dict3[1]['inv1'][era_dict3[1]['inv1']>0].index
e2_treat=era_dict3[2]['inv2'][era_dict3[2]['inv2']>0].index
e3_treat=era_dict3[3]['inv3'][era_dict3[3]['inv3']>0].index

#Establish control groups by era
e1_control=era_dict3[1]['inv1'][era_dict3[1]['inv1']==0].index
e2_control=era_dict3[2]['inv2'][era_dict3[2]['inv2']==0].index
e3_control=era_dict3[3]['inv3'][era_dict3[3]['inv3']==0].index

For each of these groups, we need to compare the average growth rates within each era.

In [318]:
#Capture desired columns
comp_cols=['mean_iit','mean_prop','50%_iit','50%_prop']

#Capture average growth rates for treatment and control groups in a single DF
e1_comp=DataFrame(era_dict3[1].ix[e1_treat][comp_cols].mean()).rename(columns={0:'treatment'}).\
            join(DataFrame(era_dict3[1].ix[e1_control][comp_cols].mean())).rename(columns={0:'control'})
e2_comp=DataFrame(era_dict3[2].ix[e2_treat][comp_cols].mean()).rename(columns={0:'treatment'}).\
            join(DataFrame(era_dict3[2].ix[e2_control][comp_cols].mean())).rename(columns={0:'control'})
e3_comp=DataFrame(era_dict3[3].ix[e3_treat][comp_cols].mean()).rename(columns={0:'treatment'}).\
            join(DataFrame(era_dict3[3].ix[e3_control][comp_cols].mean())).rename(columns={0:'control'})

#Plot comparisons for each era
plt.rcParams['figure.figsize']=15,20
fig,axes=plt.subplots(3)
e1_comp.plot(kind='bar',ax=axes[0],title='Growth Rate Comparisons - First Era')
e2_comp.plot(kind='bar',ax=axes[1],title='Growth Rate Comparisons - Second Era')
e3_comp.plot(kind='bar',ax=axes[2],title='Growth Rate Comparisons - Third Era')
plt.tight_layout()

Wow, that's interesting. For almost every comparison, the control group grew faster. This smacks of selection bias in the identification of projects. That guy is fighting with spatial resolution to become public enemy #1.

It would be useful to capture the same plots for lagged investment scenarios:

  1. Era 2 data with Era 1 groups;
  2. Era 3 data with Era 1 groups; and,
  3. Era 3 data with Era 2 groups.

These plots should allow us to see distancing that may occur after the investment has had a few years to take an effect.

In [319]:
#Capture average growth rates for treatment and control groups in a single DF
e2L1_comp=DataFrame(era_dict3[2].ix[e1_treat][comp_cols].mean()).rename(columns={0:'treatment'}).\
            join(DataFrame(era_dict3[2].ix[e1_control][comp_cols].mean())).rename(columns={0:'control'})
e3L1_comp=DataFrame(era_dict3[3].ix[e1_treat][comp_cols].mean()).rename(columns={0:'treatment'}).\
            join(DataFrame(era_dict3[3].ix[e1_control][comp_cols].mean())).rename(columns={0:'control'})
e3L2_comp=DataFrame(era_dict3[3].ix[e2_treat][comp_cols].mean()).rename(columns={0:'treatment'}).\
            join(DataFrame(era_dict3[3].ix[e2_control][comp_cols].mean())).rename(columns={0:'control'})

#Plot comparisons for each era
plt.rcParams['figure.figsize']=15,20
fig,axes=plt.subplots(3)
e2L1_comp.plot(kind='bar',ax=axes[0],title='Lagged Growth Rate Comparisons - First Era Groups & Second Era Data')
e3L1_comp.plot(kind='bar',ax=axes[1],title='Lagged Growth Rate Comparisons - First Era Groups & Third Era Data')
e3L2_comp.plot(kind='bar',ax=axes[2],title='Lagged Growth Rate Comparisons - Second Era Groups & Third Era Data')
plt.tight_layout()

It appears that in the second era (2005-2007), the treatment group neighborhoods did make some progress relative to the control groups. This was, however, quickly and substantially, reversed in the third time period.

Appendix

Enter those more detailed scatters I alluded to above. Note that the initial plots are interactive. The zoom and pan tools are especially useful.

In [320]:
output_notebook()

Configuring embedded BokehJS mode.

Era #1 (2001-2004)

Income Response

In [321]:
scatter(era_dict3[1]['mean_iit_ec'],era_dict3[1]['inv1'],\
        fill_color=era_dict3[1]['colors'],fill_alpha=0.6,
        name='mean_iit_ec_vs_current_inv',title='Mean IIT(EC) vs Current Investment')
Plots
Out[321]:
<bokeh.objects.Plot at 0x5bdb2910>
In [322]:
scatter(era_dict3[1]['50%_iit_ec'],era_dict3[1]['inv1'],\
        fill_color=era_dict3[1]['colors'],fill_alpha=0.6,
        name='med_iit_ec_vs_current_inv',title='Median IIT(EC) vs Current Investment')
Plots
Out[322]:
<bokeh.objects.Plot at 0x5a4faed0>

Assessment Value Response

In [323]:
scatter(era_dict3[1]['mean_prop_ec'],era_dict3[1]['inv1'],\
        fill_color=era_dict3[1]['colors'],fill_alpha=0.6,
        name='mean_prop_ec_vs_current_inv',title='Mean AV(EC) vs Current Investment')
Plots
Out[323]:
<bokeh.objects.Plot at 0x42e22b10>
In [324]:
scatter(era_dict3[1]['50%_prop_ec'],era_dict3[1]['inv1'],\
        fill_color=era_dict3[1]['colors'],fill_alpha=0.6,
        name='med_prop_ec_vs_current_inv',title='Median AV(EC) vs Current Investment')
Plots
Out[324]:
<bokeh.objects.Plot at 0x2ac16510>

Well these plots certainly do little to support the impact of investment. That being said, they show only the impact of current investment when we expect transformative investments will take some time to take hold. We can begin to explore this effect in the next era.

Era #2 (2005-2007)

Income Effect

In [325]:
scatter(era_dict3[2]['mean_iit_ec'],era_dict3[2]['inv2'],\
        fill_color=era_dict3[1]['colors'],fill_alpha=0.6,
        name='mean_iit_ec_vs_current_inv2',title='Mean IIT(EC) vs Current Investment')
Plots
Out[325]:
<bokeh.objects.Plot at 0x463bce10>
In [326]:
scatter(era_dict3[2]['mean_iit_ec'],era_dict3[2]['inv1'],\
        fill_color=era_dict3[1]['colors'],fill_alpha=0.6,
        name='mean_iit_ec_vs_L1_inv2',title='Mean IIT(EC) vs L1 Investment')
Plots
Out[326]:
<bokeh.objects.Plot at 0x499de1d0>
In [327]:
scatter(era_dict3[2]['50%_iit_ec'],era_dict3[2]['inv2'],\
        fill_color=era_dict3[1]['colors'],fill_alpha=0.6,
        name='med_iit_ec_vs_current_inv2',title='Median IIT(EC) vs Current Investment')
Plots
Out[327]:
<bokeh.objects.Plot at 0x42de6fd0>
In [328]:
scatter(era_dict3[2]['50%_iit_ec'],era_dict3[2]['inv1'],\
        fill_color=era_dict3[1]['colors'],fill_alpha=0.6,
        name='med_iit_ec_vs_L1_inv2',title='Median IIT(EC) vs L1 Investment')
Plots
Out[328]:
<bokeh.objects.Plot at 0x5bff8b50>

Assessment Value Response

In [329]:
scatter(era_dict3[2]['mean_prop_ec'],era_dict3[2]['inv2'],\
        fill_color=era_dict3[1]['colors'],fill_alpha=0.6,
        name='mean_prop_ec_vs_current_inv2',title='Mean AV(EC) vs Current Investment')
Plots
Out[329]:
<bokeh.objects.Plot at 0x5bdaa210>
In [330]:
scatter(era_dict3[2]['mean_prop_ec'],era_dict3[2]['inv1'],\
        fill_color=era_dict3[1]['colors'],fill_alpha=0.6,
        name='mean_prop_ec_vs_L1_inv2',title='Mean AV(EC) vs L1 Investment')
Plots
Out[330]:
<bokeh.objects.Plot at 0x2cb39490>
In [331]:
scatter(era_dict3[2]['50%_prop_ec'],era_dict3[2]['inv2'],\
        fill_color=era_dict3[1]['colors'],fill_alpha=0.6,
        name='med_prop_ec_vs_current_inv2',title='Median AV(EC) vs Current Investment')
Plots
Out[331]:
<bokeh.objects.Plot at 0x515ab6d0>
In [332]:
scatter(era_dict3[2]['50%_prop_ec'],era_dict3[2]['inv1'],\
        fill_color=era_dict3[1]['colors'],fill_alpha=0.6,
        name='med_prop_ec_vs_L1_inv2',title='Median AV(EC) vs L1 Investment')
Plots
Out[332]:
<bokeh.objects.Plot at 0x5941e710>

Wow, nothing compelling here. While it is good to see that investment areas generally have positive growth, so do most neighborhoods. Furthermore, some areas without investment (and in some cases low starting endowments) are outpacing those with investment.

Era #3 (2008-2011)

Income Response

In [333]:
scatter(era_dict3[3]['mean_iit_ec'],era_dict3[3]['inv3'],\
        fill_color=era_dict3[3]['colors'],fill_alpha=0.6,
        name='mean_iit_ec_vs_current_inv3',title='Mean IIT(EC) vs Current Investment')
Plots
Out[333]:
<bokeh.objects.Plot at 0x55cc18d0>
In [334]:
scatter(era_dict3[3]['mean_iit_ec'],era_dict3[3]['inv2'],\
        fill_color=era_dict3[3]['colors'],fill_alpha=0.6,
        name='mean_iit_ec_vs_L1_inv3',title='Mean IIT(EC) vs L1 Investment')
Plots
Out[334]:
<bokeh.objects.Plot at 0x4ff79250>
In [335]:
scatter(era_dict3[3]['mean_iit_ec'],era_dict3[3]['inv1'],\
        fill_color=era_dict3[3]['colors'],fill_alpha=0.6,
        name='mean_iit_ec_vs_L2_inv3',title='Mean IIT(EC) vs L2 Investment')
Plots
Out[335]:
<bokeh.objects.Plot at 0x513996d0>
In [336]:
scatter(era_dict3[3]['50%_iit_ec'],era_dict3[3]['inv3'],\
        fill_color=era_dict3[3]['colors'],fill_alpha=0.6,
        name='med_iit_ec_vs_current_inv3',title='Median IIT(EC) vs Current Investment')
Plots
Out[336]:
<bokeh.objects.Plot at 0x5bdaa290>
In [337]:
scatter(era_dict3[3]['50%_iit_ec'],era_dict3[3]['inv2'],\
        fill_color=era_dict3[3]['colors'],fill_alpha=0.6,
        name='med_iit_ec_vs_L1_inv3',title='Median IIT(EC) vs L1 Investment')
Plots
Out[337]:
<bokeh.objects.Plot at 0x46913390>
In [338]:
scatter(era_dict3[3]['50%_iit_ec'],era_dict3[3]['inv1'],\
        fill_color=era_dict3[3]['colors'],fill_alpha=0.6,
        name='med_iit_ec_vs_L2_inv3',title='Median IIT(EC) vs L2 Investment')
Plots
Out[338]:
<bokeh.objects.Plot at 0x5bd94290>

Assessment Value Response

In [339]:
scatter(era_dict3[3]['mean_prop_ec'],era_dict3[3]['inv3'],\
        fill_color=era_dict3[3]['colors'],fill_alpha=0.6,
        name='mean_prop_ec_vs_current_inv3',title='Mean AV(EC) vs Current Investment')
Plots
Out[339]:
<bokeh.objects.Plot at 0x58c7dd10>
In [340]:
scatter(era_dict3[3]['mean_prop_ec'],era_dict3[3]['inv2'],\
        fill_color=era_dict3[3]['colors'],fill_alpha=0.6,
        name='mean_prop_ec_vs_L1_inv3',title='Mean AV(EC) vs L1 Investment')
Plots
Out[340]:
<bokeh.objects.Plot at 0x57d8e710>
In [341]:
scatter(era_dict3[3]['mean_prop_ec'],era_dict3[3]['inv1'],\
        fill_color=era_dict3[3]['colors'],fill_alpha=0.6,
        name='mean_prop_ec_vs_L2_inv3',title='Mean AV(EC) vs L2 Investment')
Plots
Out[341]:
<bokeh.objects.Plot at 0x5bff8c10>
In [342]:
scatter(era_dict3[3]['50%_prop_ec'],era_dict3[3]['inv3'],\
        fill_color=era_dict3[3]['colors'],fill_alpha=0.6,
        name='med_prop_ec_vs_current_inv3',title='Median AV(EC) vs Current Investment')
Plots
Out[342]:
<bokeh.objects.Plot at 0x5cde3510>
In [343]:
scatter(era_dict3[3]['50%_prop_ec'],era_dict3[3]['inv2'],\
        fill_color=era_dict3[3]['colors'],fill_alpha=0.6,
        name='med_prop_ec_vs_L1_inv3',title='Median AV(EC) vs L1 Investment')
Plots
Out[343]:
<bokeh.objects.Plot at 0x5cde3250>
In [344]:
scatter(era_dict3[3]['50%_prop_ec'],era_dict3[3]['inv1'],\
        fill_color=era_dict3[3]['colors'],fill_alpha=0.6,
        name='med_prop_ec_vs_L2_inv3',title='Median AV(EC) vs L2 Investment')
Plots
Out[344]:
<bokeh.objects.Plot at 0x5b799c50>

Well, this supports the notion that early detection of tax base response to investment is a less than straightforward exercise. I used bokeh plots to enable closer inspection (with pan & zoom control), but perhaps a more concise view would be helpful.

Here is the first era (2001-2004).

In [345]:
#Define columns to insert into the matrix
cor_cols=['mean_iit','mean_prop','50%_iit','50%_prop','inv1','inv2','inv3']

scatter_matrix(era_dict3[1][cor_cols],alpha=.4,figsize=(20,20),diagonal='kde');

The second era (2005-2007)...

In [346]:
scatter_matrix(era_dict3[2][cor_cols].dropna().replace(inf,0),alpha=.4,figsize=(20,20),diagonal='kde');

...and the third era (2008-2011)

In [347]:
scatter_matrix(era_dict3[3][cor_cols],alpha=.8,figsize=(20,20),diagonal='kde');
In [347]: